Refactor benchmark workflow #52

andreab1997 · 2022-10-19T10:02:11Z

As @alecandido suggested, I am refactoring the workflow in order to download the files needed for benchmarks directly from the server (https://data.nnpdf.science/)

codecov-commenter · 2022-10-19T12:33:06Z

Codecov Report

Merging #52 (7ac0cbe) into main (4b1d511) will decrease coverage by 42.07%.
The diff coverage is 30.95%.

Additional details and impacted files

@@             Coverage Diff             @@
##             main      #52       +/-   ##
===========================================
- Coverage   88.51%   46.44%   -42.08%     
===========================================
  Files          16       17        +1     
  Lines         592      618       +26     
===========================================
- Hits          524      287      -237     
- Misses         68      331      +263

Flag	Coverage Δ
bench	`?`
unittests	`46.44% <30.95%> (-0.36%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/pineko/check.py	`83.72% <0.00%> (-9.31%)`	⬇️
src/pineko/comparator.py	`18.18% <0.00%> (-72.73%)`	⬇️
src/pineko/ekompatibility.py	`21.42% <21.42%> (ø)`
src/pineko/evolve.py	`25.45% <25.00%> (-63.44%)`	⬇️
src/pineko/theory.py	`19.76% <25.00%> (-65.03%)`	⬇️
src/pineko/cli/check.py	`55.76% <50.00%> (-38.47%)`	⬇️
src/pineko/cli/convolute.py	`66.66% <50.00%> (-7.25%)`	⬇️
src/pineko/theory_card.py	`16.00% <0.00%> (-68.00%)`	⬇️
... and 7 more

andreab1997 · 2022-10-19T12:53:49Z

This should be ready but I believe we need to install wget on the container, right? @alecandido

felixhekhorn · 2022-10-19T12:56:22Z

as @alecandido said: please separate a script that you call in the workflow, but that one can run even with out (i.e. locally)

andreab1997 · 2022-10-20T08:48:54Z

As you can see, the tests after ebf6342 work. This is because I have just used apt-get install wget in the workflow (and of course this is not what we want). However, at least we know that the only thing that we miss is to install wget in the container.

alecandido · 2022-10-20T09:37:52Z

However, at least we know that the only thing that we miss is to install wget in the container.

Just to spread the knowledge about how containers are made:

you should go here https://github.com/N3PDF/workflows/blob/main/containers/bench-evol/Containerfile and add wget to the list of packages to be installed, in the second stage
you should follow the instructions here https://github.com/N3PDF/workflows/blob/main/containers/README.md to build and upload the new image

But don't do it, there is already curl installed, so replace wget by curl (curl main purpose is making HTTP requests, so it is not downloading files by default, but you can with a couple of options).

alecandido · 2022-10-20T09:39:30Z

In particular, use:

curl -o output.html http://google.com/

https://stackoverflow.com/a/9360357/8653979

alecandido · 2022-10-20T09:42:08Z

Ah no, wait. My bad, you are using recursive download, so you need wget.

Keep going with build and upload, as explained in the instructions above.

Also, move the download_test_files.sh top-level. It is useful also for local testing, no need to hide :)

andreab1997 · 2022-10-20T09:44:47Z

Ah no, wait. My bad, you are using recursive download, so you need wget.

Keep going with build and upload, as explained in the instructions above.

Also, move the download_test_files.sh top-level. It is useful also for local testing, no need to hide :)

Yes, I was already struggling to do recursive download with curl :)

andreab1997 · 2022-10-20T09:54:13Z

2. https://github.com/N3PDF/workflows/blob/main/containers/README.md

Anyway I believe I need some to provide the github token in some way because if I do podman pull ghcr.io/n3pdf/bench-evol:latest I get the error Error: initializing source docker://ghcr.io/n3pdf/bench-evol:latest: Requesting bearer token: invalid status code from registry 403 (Forbidden)

alecandido · 2022-10-20T10:21:25Z

# login to GitHub registry with user credentials (not organization), see [3]
echo ${PAT} | podman login ghcr.io -u <username> --password-stdin

[...]
[3]: official GitHub registry docs

As specified in the linked docs, you should give a PAT of yours the write:packages scope.

P.S.: it is not the $GITHUB_TOKEN of the workflow, that's a different thing, it is a PAT (Personal Access Token)

andreab1997 · 2022-10-20T12:02:12Z

# login to GitHub registry with user credentials (not organization), see [3]
echo ${PAT} | podman login ghcr.io -u <username> --password-stdin
[...]
[3]: official GitHub registry docs
As specified in the linked docs, you should give a PAT of yours the write:packages scope.

P.S.: it is not the $GITHUB_TOKEN of the workflow, that's a different thing, it is a PAT (Personal Access Token)

Ok thanks, now it works. However, maybe the install.sh script is broken for lhapdf. I have been trying for a while but it still fails. The error seems to be g++: fatal error: Killed signal terminated program cc1plus compilation terminated. make[2]: *** [Makefile:557: liblhapdf_yaml_cpp_la-emitterutils.lo] Error 1
Any clues?

alecandido · 2022-10-20T12:15:03Z

No, wait, I'm stupid. I pointed you to the bench-evol container, but here you are using the lhapdf one...

andreab1997 · 2022-10-20T12:21:12Z

No, wait, I'm stupid. I pointed you to the bench-evol container, but here you are using the lhapdf one...

Here in bench.yml I am using bench-evol that is why I am trying to build it. Should I use something else? Anyway I need lhapdf for the benchmarks

alecandido · 2022-10-20T12:23:59Z

If you just need lhapdf, and not APFEL or Pegasus or QCDNUM, then use the more restricted lhapdf container (same repo).

However, I tried to run install.sh and I had no issue. Try yourself to do it in a fresh environment (create with virtualenv env and just activate), and (if still not working) try on a Linux machine.

andreab1997 · 2022-10-20T12:37:18Z

If you just need lhapdf, and not APFEL or Pegasus or QCDNUM, then use the more restricted lhapdf container (same repo).

However, I tried to run install.sh and I had no issue. Try yourself to do it in a fresh environment (create with virtualenv env and just activate), and (if still not working) try on a Linux machine.

But actually the install script fails on the container (so after having done ./build.bash bench-evol) which is a Linux machine, right?

alecandido · 2022-10-20T12:56:09Z

g++: fatal error: Killed signal terminated program cc1plus compilation terminated. make[2]: *** [Makefile:557: liblhapdf_yaml_cpp_la-emitterutils.lo] Error 1 Any clues?

Killed signal terminated program is telling you that something killed it, like when you press Ctrl-C...
Compiled on dom, and it is just working.

alecandido · 2022-10-20T13:23:47Z

Ok, I actually have the new image on dom, but I can't upload it, because of the following error:

Error: writing blob: initiating layer upload to /v2/n3pdf/bench-evol/blobs/uploads/ in ghcr.io: denied: permission_denied: write_package

Unfortunately, GitHub just introduced fine-grained PAT, and my classic token, with write:packages permission, it is not enough to publish a package to the @N3PDF organization.
Moreover, I can't issue a fine-grained token with access to N3PDF/workflows, because it is not allowed by @N3PDF organization.

Both (classic and fine-grained) can be allowed to perform actions on @N3PDF, but @scarrazza or @scarlehoff has to do it explicitly at:
https://github.com/organizations/N3PDF/settings/personal-access-tokens-onboarding

(I'm not an admin of @N3PDF, so I can't do it myself)

alecandido · 2022-10-22T10:10:14Z

Ok, now I've done everything I could: we also have a releasing workflow, such that packages can be built in the CI and released automagically.
https://github.com/N3PDF/workflows/blob/v2/.github/workflows/publish-containers.yml

But it is still failing:
https://github.com/N3PDF/workflows/actions/runs/3302837504
with the usual error:

Error: writing blob: initiating layer upload to /v2/n3pdf/lhapdf/blobs/uploads/ in ghcr.io: denied: permission_denied: write_package

So, not even the ${{ github.token }} (or ${GITHUB_TOKEN}, is just the same) is able to release, here the default permissions:
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token

Therefore, it is really up to @N3PDF admins, since the Default access (permissive) has the write:packages scope, this means it has been somehow set to Default access (restricted).

Here the guide on how to control permissions for the ${GITHUB_TOKEN}.
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token

The moment it is fixed, we can get the new container just by adding a tag, or rerunning the last workflow.

Now, I can't do anything more for this PR, it is totally up to @scarrazza and @scarlehoff

alecandido · 2022-10-22T10:24:52Z

Actually, I can't experiment not being an admin, so I don't know what is on the other side, but maybe is also a GitHub problem.

In particular, if you read the output of the first section, "Set up job", there is a dedicated item to "GITHUB_TOKEN Permissions", and this is the output:


GITHUB_TOKEN Permissions
  Actions: write
  Checks: write
  Contents: write
  Deployments: write
  Discussions: write
  Issues: write
  Metadata: read
  Packages: write
  Pages: write
  PullRequests: write
  RepositoryProjects: write
  SecurityEvents: write
  Statuses: write

But consider that 8 hours ago these people (the one providing the push action I'm using) were able to release with an extremely similar workflow:
https://github.com/redhat-actions/push-to-registry/actions/runs/3301376932/jobs/5446774905

alecandido · 2022-10-22T15:11:17Z

Latest update: everything still holds for new packages, I have no idea on how to publish them if the tokens are not accepted.

However, for existing packages of which I'm the owner I could solve it. So, not bench_evol, since @niclaurenti is the owner, but I can instruct him on how to do it.

But here the dependency is only lhapdf, and for that there is an lhapdf container I could publish (since I'm the owner of that). bench_evol has been tailored for EKO benchmarks, providing APFEL, QCDNUM, Pegasus, and so on.

So, the solution was to switch benchmark.

Now it is still failing because of the dependency on lz4, since it requires to build the wheel and gcc is not available in the container. But this is in the range of things that have a straight solution :)
(now it is more a problem that there are more than one, so I will try to pick the best, but at some point any is fine to begin with)

alecandido · 2022-10-23T10:21:50Z

Problem with lz4 solved on EKO side, now it is fine.

Current problem is that I had to release the version with Composite Output struct, so the old output.Output is not any longer there.
Two solutions:

we revert the version of EKO, and we merge with the failing workflow (then open a PR to update the dependency and the EKO API usage)
we update immediately here

I believe it should not require a major rework to use the new structure, it's just a matter of updating the usage here and there. I would go for 2.

P.S.: now it is not a workflow/containter problem any longer, even if there is something fishy going on from the container side (but EKO isobench are passing...)

felixhekhorn · 2022-10-23T10:40:43Z

Current problem is that I had to release the version with Composite Output struct, so the old output.Output is not any longer there.

for that exact reason: can you please drop the new releases on the 0.10 branch and commit instead a 0.11 version? Otherwise we run into lot's of trouble all over the place, because poetry thinks they are compatible, which they are not

alecandido · 2022-10-23T10:42:29Z

I actually thought the were compatible, I didn't remember I dropped the struct. However, you can keep going and reshuffle the tags :)

felixhekhorn · 2022-10-23T10:48:26Z

I think a tag can not be renamed, can it? (that wouldn't make sense to me ...) and, I guess, the problem is PyPI (since poetry will look there), right?

alecandido · 2022-10-23T10:53:22Z

With "reshuffle tags" I mean:

delete the tag from the repo (actually, from v0.10.3 to v0.10.5 included)
yank the release from PyPI (there is only 0.10.5, the other workflows failed)
release v0.11.0, and it will automatically generate the new version on PyPI

felixhekhorn · 2022-10-23T11:02:03Z

ok, I believe to have done ...

andreab1997 · 2022-10-24T07:42:55Z

So, what is missing here? I see that the workflows are failing for the reason you mentioned, are you taking care of this or should I do something?

alecandido · 2022-10-27T18:22:13Z

Just rebased on main. There was a conflict just with the renaming q2_grid -> muf2_grid and similar.

andreab1997 · 2022-10-28T09:02:37Z

I am reviewing it now :)

src/pineko/evolve.py

andreab1997 · 2022-10-28T09:17:39Z

Everything seems fine now (of course to be also tested on an actual FK computation). Anyway tests are passing and I don't see anything wrong. So I am asking your review again I then I believe we can merge. Thanks for this! @alecandido @felixhekhorn

andreab1997 · 2022-10-28T09:18:25Z

@felixhekhorn I have added you at least you know what it is happened :)

felixhekhorn added enhancement New feature or request refactor Refactor code labels Oct 20, 2022

andreab1997 mentioned this pull request Oct 20, 2022

Fix alpha value scheme C #53

Merged

alecandido and others added 19 commits October 27, 2022 20:18

Standardize usage of github token

4f77f37

Update EKO and pineappl dependencies

8ddaf53

Update eko version to 0.11

3778e1d

Fix some compatibility problem

8750279

Update EKO to v0.11.1

d763111

Fix check grid and eko compatible

4e59c1f

Fix coupling problem

51f6a9d

Add None to argument of evolve_grid in convolute

e01e95e

Fix convolute

9f736db

Add test for new info in operator card

428eef7

Add test for eko computation

9623bd3

Add compatibility layer to account for new EKO layout

fb6a25c

Convert xgrid objects to raw numpy arrays

5cce067

Fix wrong access for eko version, now an attribute

2aac6cb

Update operator card to new format

393a877

Start fixing missing thcard

6b4d63b

Fix opcard subcommand

d35fef7

Fix bench evolve

b5eb520

Propagate theory card requirement to TheoryBuilder

1cd3c83

alecandido force-pushed the refactor_workflow branch from 44a1a1f to 1cd3c83 Compare October 27, 2022 18:21

andreab1997 commented Oct 28, 2022

View reviewed changes

src/pineko/evolve.py Show resolved Hide resolved

andreab1997 requested review from alecandido and felixhekhorn October 28, 2022 09:17

alecandido approved these changes Oct 28, 2022

View reviewed changes

alecandido merged commit 85ca767 into main Oct 28, 2022

alecandido deleted the refactor_workflow branch October 28, 2022 09:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor benchmark workflow #52

Refactor benchmark workflow #52

andreab1997 commented Oct 19, 2022

codecov-commenter commented Oct 19, 2022 •

edited

Loading

andreab1997 commented Oct 19, 2022

felixhekhorn commented Oct 19, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022 •

edited

Loading

andreab1997 commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022 •

edited

Loading

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022 •

edited

Loading

alecandido commented Oct 22, 2022

alecandido commented Oct 22, 2022

alecandido commented Oct 22, 2022

alecandido commented Oct 23, 2022 •

edited

Loading

felixhekhorn commented Oct 23, 2022

alecandido commented Oct 23, 2022

felixhekhorn commented Oct 23, 2022

alecandido commented Oct 23, 2022

felixhekhorn commented Oct 23, 2022

andreab1997 commented Oct 24, 2022

alecandido commented Oct 27, 2022

andreab1997 commented Oct 28, 2022

andreab1997 commented Oct 28, 2022

andreab1997 commented Oct 28, 2022

Refactor benchmark workflow #52

Refactor benchmark workflow #52

Conversation

andreab1997 commented Oct 19, 2022

codecov-commenter commented Oct 19, 2022 • edited Loading

Codecov Report

andreab1997 commented Oct 19, 2022

felixhekhorn commented Oct 19, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022 • edited Loading

andreab1997 commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022 • edited Loading

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

andreab1997 commented Oct 20, 2022

alecandido commented Oct 20, 2022

alecandido commented Oct 20, 2022 • edited Loading

alecandido commented Oct 22, 2022

alecandido commented Oct 22, 2022

alecandido commented Oct 22, 2022

alecandido commented Oct 23, 2022 • edited Loading

felixhekhorn commented Oct 23, 2022

alecandido commented Oct 23, 2022

felixhekhorn commented Oct 23, 2022

alecandido commented Oct 23, 2022

felixhekhorn commented Oct 23, 2022

andreab1997 commented Oct 24, 2022

alecandido commented Oct 27, 2022

andreab1997 commented Oct 28, 2022

andreab1997 commented Oct 28, 2022

andreab1997 commented Oct 28, 2022

codecov-commenter commented Oct 19, 2022 •

edited

Loading

alecandido commented Oct 20, 2022 •

edited

Loading

alecandido commented Oct 20, 2022 •

edited

Loading

alecandido commented Oct 20, 2022 •

edited

Loading

alecandido commented Oct 23, 2022 •

edited

Loading